Skip to content

prov/efa: Fix FI_INJECT in efa direct#12146

Merged
jiaxiyan merged 2 commits into
ofiwg:mainfrom
jiaxiyan:inject_hmem
Apr 17, 2026
Merged

prov/efa: Fix FI_INJECT in efa direct#12146
jiaxiyan merged 2 commits into
ofiwg:mainfrom
jiaxiyan:inject_hmem

Conversation

@jiaxiyan
Copy link
Copy Markdown
Contributor

@jiaxiyan jiaxiyan commented Apr 15, 2026

Return -FI_EOPNOTSUPP for FI_INJECT when EFA device cannot use inline send.
Replace the assert in fi_inject/fi_injectdata. Fail with -FI_EOPNOTSUPP instead.

@jiaxiyan jiaxiyan requested a review from a team April 15, 2026 23:04
Comment thread prov/efa/src/efa_msg.c Outdated
@a-szegel
Copy link
Copy Markdown
Contributor

Does inject + write handle this correctly?

@jiaxiyan
Copy link
Copy Markdown
Contributor Author

Does inject + write handle this correctly?

Inline write is implemented as part of wide wqe #11944

…_initialization

The unit test binary is a single persistent process — all tests
share the same efa_device with its qp_gen_table.
Every test that creates an endpoint with data path direct enabled
increments qp_gen_table[qpn_slot].
gen is a uint8_t that increments on every QP creation for that
QPN slot across the entire test process. It will inevitably wrap to
0 after 256 QP cycles.

Signed-off-by: Jessie Yang <jiaxiyan@amazon.com>
@jiaxiyan jiaxiyan changed the title prov/efa: Fix FI_INJECT with HMEM in efa direct prov/efa: Fix FI_INJECT in efa direct Apr 16, 2026
Comment thread prov/efa/src/efa_msg.c
Comment thread prov/efa/src/efa_msg.c Outdated
Comment thread prov/efa/src/efa_msg.c Outdated
Comment thread prov/efa/src/efa_msg.c Outdated
"FI_INJECT is requested but message size of "
"%zu exceeds efa-direct inject_msg_size of "
"%zu, "
"or desc is on device memory: %d.\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do u think two warning messages based on two if would be better? like

if (len > base_ep->domain->device->efa_attr.inline_buf_size)
   EFA_WARN("msg size exceeds inject size")
if (msg->desc && efa_mr_is_hmem(msg->desc[0]))
  EFA_WARN("inject/FI_INJECT is not supported for FI_HMEM memory\n")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help the user out; tell them which thing was wrong.

shijin-aws
shijin-aws previously approved these changes Apr 16, 2026
@shijin-aws shijin-aws requested a review from sunkuamzn April 16, 2026 20:36
Comment thread prov/efa/src/efa_msg.c Outdated
}
} else {
if (flags & FI_INJECT) {
if (len > base_ep->inject_msg_size)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're duplicating the check

Can we do the check once and handle both use_inline and flags & FI_INJECT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. What is your suggested change?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

bool len_less_than_inline = len <= base_ep->domain->device->efa_attr.inline_buf_size;
bool is_hmem = msg->desc && efa_mr_is_hmem(msg->desc[0]);

if (len_less_than_inline && !is_hmem) {
    use_inline = true;
} else {
    if (flags & FI_INJECT) {
        if (len_less_than_inline) // print error for length
        else // print error for hmem
    }
}       

Return -FI_EOPNOTSUPP for FI_INJECT when EFA device cannot use inline send.
Replace the assert in fi_inject/fi_injectdata, fail with -FI_EOPNOTSUPP instead.

Signed-off-by: Jessie Yang <jiaxiyan@amazon.com>
@jiaxiyan jiaxiyan merged commit a2f162a into ofiwg:main Apr 17, 2026
22 checks passed
darrylabbate added a commit to darrylabbate/libfabric that referenced this pull request Apr 29, 2026
…d_prep

The post_send mock queued via will_return_int_always is not consumed
by tests that override g_efa_unit_test_mocks.efa_qp_post_send with a
different mock. cmocka then reports the queued value as a leak and
fails the test with:

  [  ERROR   ] --- %s() has remaining non-returned values.
  prov/efa/test/efa_unit_test_msg.c: remaining item was declared here

Use will_return_int_maybe instead so the queued value may go unused
when a test installs its own post_send mock.

This is a targeted extract of upstream commit a2f162a (ofiwg#12146,
"prov/efa: Fix FI_INJECT in efa direct"), limited to the single
will_return_int_maybe change needed as a prerequisite for the
test_efa_msg_sendmsg_multi_iov_second_desc_hmem_fails test added by
the v2.4.x backport of ofiwg#12163.

Signed-off-by: Darryl Abbate <drl@amazon.com>
darrylabbate added a commit that referenced this pull request Apr 29, 2026
…d_prep

The post_send mock queued via will_return_int_always is not consumed
by tests that override g_efa_unit_test_mocks.efa_qp_post_send with a
different mock. cmocka then reports the queued value as a leak and
fails the test with:

  [  ERROR   ] --- %s() has remaining non-returned values.
  prov/efa/test/efa_unit_test_msg.c: remaining item was declared here

Use will_return_int_maybe instead so the queued value may go unused
when a test installs its own post_send mock.

This is a targeted extract of upstream commit a2f162a (#12146,
"prov/efa: Fix FI_INJECT in efa direct"), limited to the single
will_return_int_maybe change needed as a prerequisite for the
test_efa_msg_sendmsg_multi_iov_second_desc_hmem_fails test added by
the v2.4.x backport of #12163.

Signed-off-by: Darryl Abbate <drl@amazon.com>
darrylabbate added a commit to darrylabbate/libfabric that referenced this pull request May 6, 2026
…d_prep

The post_send mock queued via will_return_int_always is not consumed
by tests that override g_efa_unit_test_mocks.efa_qp_post_send with a
different mock. cmocka then reports the queued value as a leak and
fails the test with:

  [  ERROR   ] --- %s() has remaining non-returned values.
  prov/efa/test/efa_unit_test_msg.c: remaining item was declared here

Use will_return_int_maybe instead so the queued value may go unused
when a test installs its own post_send mock.

This is a targeted extract of upstream commit a2f162a (ofiwg#12146,
"prov/efa: Fix FI_INJECT in efa direct"), limited to the single
will_return_int_maybe change needed as a prerequisite for the
test_efa_msg_sendmsg_multi_iov_second_desc_hmem_fails test added by
the v2.5.x backport of ofiwg#12163.

Signed-off-by: Darryl Abbate <drl@amazon.com>
shijin-aws pushed a commit that referenced this pull request May 7, 2026
…d_prep

The post_send mock queued via will_return_int_always is not consumed
by tests that override g_efa_unit_test_mocks.efa_qp_post_send with a
different mock. cmocka then reports the queued value as a leak and
fails the test with:

  [  ERROR   ] --- %s() has remaining non-returned values.
  prov/efa/test/efa_unit_test_msg.c: remaining item was declared here

Use will_return_int_maybe instead so the queued value may go unused
when a test installs its own post_send mock.

This is a targeted extract of upstream commit a2f162a (#12146,
"prov/efa: Fix FI_INJECT in efa direct"), limited to the single
will_return_int_maybe change needed as a prerequisite for the
test_efa_msg_sendmsg_multi_iov_second_desc_hmem_fails test added by
the v2.5.x backport of #12163.

Signed-off-by: Darryl Abbate <drl@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants